Improving out of vocabulary words recognition accuracy for an end-to-end Russian speech recognition system
Abstract
Automatic speech recognition (ASR) systems are rapidly entering daily life and simplifying the way we interact with electronic devices. The advent of end-to-end approaches has only accelerated this process. However, the constant evolution of the Russian language and its high degree of inflection give rise to out-of-vocabulary (OOV) words: new words that the ASR system fails to recognize because they were absent from its training data. In such cases, the ASR model tends to predict the most similar word from the training data, which leads to a recognition error. This is especially true for ASR models that decode with a Weighted Finite State Transducer (WFST), since their output is limited to the list of vocabulary words. In this paper, the problem is investigated on an open Russian-language dataset (Common Voice) with an end-to-end ASR system that uses a WFST decoder. Two methods are proposed: retraining the end-to-end ASR system with the discriminative maximum mutual information (MMI) loss function, and decoding the end-to-end model with a TG graph. Discriminative training smooths the probability distribution of the predicted acoustic classes, which adds variability to the recognition results. Decoding with the TG graph, in turn, is not limited to vocabulary words and allows the use of a language model trained on a large amount of external text data. An eight-hour subset of Common Voice is used as the test set; OOV words account for 18.1 % of this test set. The results show that the proposed methods reduce the word error rate (WER) by 3 % absolute compared with the standard decoding method for end-to-end models (beam search), while keeping the ability to recognize OOV words at a comparable level.
The proposed methods should improve the overall recognition quality of ASR systems and make them more robust to new words that were not seen during training.
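For readers unfamiliar with the two figures quoted above, the following minimal Python sketch shows how a word error rate and an OOV rate are commonly computed. It is an illustration, not the authors' evaluation code: the function names (`edit_distance`, `wer`, `oov_rate`) and the toy English sentences are hypothetical, and the OOV rate is assumed to be counted over running words (tokens) of the test references, which the abstract does not specify.

```python
from typing import List, Set


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Total word errors divided by the total number of reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total


def oov_rate(train_vocab: Set[str], test_refs: List[str]) -> float:
    """Share of test tokens absent from the training vocabulary (assumed token-level)."""
    words = [w for r in test_refs for w in r.split()]
    return sum(w not in train_vocab for w in words) / len(words)


# Toy data (hypothetical): "sofa" is OOV, so a vocabulary-limited decoder
# substitutes the closest in-vocabulary word, "mat".
train_vocab = {"the", "cat", "sat", "on", "mat"}
refs = ["the cat sat on the sofa"]
hyps = ["the cat sat on the mat"]
print(f"WER = {wer(refs, hyps):.3f}, OOV rate = {oov_rate(train_vocab, refs):.3f}")
```

In this toy case the single OOV word produces exactly one substitution error, which mirrors the failure mode described in the abstract for decoders restricted to a fixed vocabulary.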
Articles in current issue
- Pulse recording of dynamic holograms in bismuth silicate crystal in a broad wavelength range
- Hybrid endoscope with television and multispectral image processing for the internal organs cancer early diagnostics
- Modelling of a composite waveguide holographic display
- Application of infrared spectroscopy methods in studying compositions for paper sizing
- Distribution optimization method of pixel density by surveillance area
- Evaluation and development of a method for compensating the positioning error of computer numeric control equipment
- Compensation of output external disturbances for a class of linear systems with control delay
- Luminescence technique for studying the growth of AgInS2 quantum dots
- Peculiarities of pulsed laser deposition of thin InGaAsN films in an active background gas atmosphere
- Determination of the electron distribution in thin barrier AlGaAs/GaAs superlattices by capacitance-voltage profiling
- Spectral and kinetic properties of silver sulfide quantum dots in an external electric field
- Influence of nano-sized horizontal inhomogeneities on surface profiling by means of XPS
- Organic light-emitting diodes with new dyes based on coumarin
- Fabrication and characterization of hybrid composite of Al6082/SiC/rice husk powder using friction stir processing
- A multi-path secure routing for the detection of node capturing attack in wireless sensor network
- A method for documenting architectural solutions of computing platforms
- Method for monitoring the state of elements of cyber-physical systems based on time series analysis
- Application of the text wave model to the sentiment analysis problem
- Automated evaluation of ECG parameters during the COVID-19 pandemic
- Multi-agent adaptive routing by multi-head-attention-based twin agents using reinforcement learning
- Joint learning of agents and graph embeddings in a conveyor belt control problem
- Simulation of radiative transfer in gas-liquid foams
- The effect of signal-to-noise ratio value on the error in measuring acoustic emission parameters: statistical assessment
- Simulating the process of steady-state thermoreflectance for measuring the thermal conductivity of materials
- Modeling and simulation of one- and two-row six-bladed ducted fans
- Differential-difference model of heat transfer in solids using the method of parametric identification